Subjectively Interesting Alternative Clusters
Abstract
We apply a recently proposed framework for mining subjectively interesting patterns from data [3] to the problem of clustering, where the patterns are clusters in the data. This framework outlines how the subjective interestingness of patterns (here, clusters) can be quantified using sound information-theoretic concepts. We demonstrate how it motivates a new objective function that quantifies the interestingness of a set of clusters, automatically accounting for a user's prior beliefs and for redundancies between the clusters. Directly searching for the optimal set of clusters defined in this way is hard. However, the optimization problem can be solved to a provably good approximation if clusters are generated iteratively, paralleling the iterative data mining setting discussed in [3]. In this iterative scheme, each subsequent cluster is maximally interesting given the previously generated ones, automatically trading off interestingness against redundancy. This implementation of the clustering approach can therefore be regarded as a method for alternative clustering. Although generating each cluster in the iterative scheme is itself computationally hard, we develop an approximation technique similar to spectral clustering algorithms. We end with a few visual demonstrations of the alternative clustering approach on artificial datasets.
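The iterative scheme described in the abstract can be illustrated with a minimal greedy sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the candidate clusters are given up front as sets of point indices, and the score (the number of points not already covered by earlier clusters) is a toy stand-in for the information-theoretic interestingness measure, which the abstract does not specify. It shows only the general pattern of picking, at each step, the cluster with the largest gain given the clusters already chosen.

```python
from typing import FrozenSet, List, Sequence, Set


def marginal_gain(cluster: FrozenSet[int], covered: Set[int]) -> int:
    """Toy stand-in for interestingness: points in `cluster` not yet covered.

    Assumption: this is NOT the paper's measure, only a simple score that
    penalizes redundancy with previously selected clusters.
    """
    return len(cluster - covered)


def greedy_alternative_clusters(
    candidates: Sequence[FrozenSet[int]], k: int
) -> List[FrozenSet[int]]:
    """Pick up to k clusters, each maximizing the gain given earlier picks."""
    selected: List[FrozenSet[int]] = []
    covered: Set[int] = set()
    remaining = list(candidates)
    for _ in range(k):
        if not remaining:
            break
        # Choose the candidate that adds the most given what is already selected.
        best = max(remaining, key=lambda c: marginal_gain(c, covered))
        if marginal_gain(best, covered) == 0:
            break  # every remaining candidate is fully redundant
        selected.append(best)
        covered |= best
        remaining.remove(best)
    return selected


# Hypothetical candidate clusters over points 0..6.
candidates = [
    frozenset({0, 1, 2, 3}),
    frozenset({2, 3, 4}),
    frozenset({4, 5, 6}),
    frozenset({0, 1}),
]
picked = greedy_alternative_clusters(candidates, k=2)
```

In this toy run the second pick is `{4, 5, 6}` rather than `{2, 3, 4}`, because the latter overlaps heavily with the first cluster; this is the trade-off between interestingness and redundancy in miniature.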
Similar resources
Cluster Generation and Scheduling for Instruction (L0) Clusters
Clustered L0 buffers are an interesting alternative to reduce energy consumption in the instruction memory hierarchy of embedded VLIW processors. Currently, the synthesis of L0 clusters is performed as a hardware optimization, where the compiler generates a schedule and L0 clusters are generated based on the given schedule. Since the result of the clustering depends on the given schedule, it ...
A Generalized Framework for Revealing Analogous Themes across Related Topics
This work addresses the task of identifying thematic correspondences across subcorpora focused on different topics. We introduce an unsupervised algorithmic framework based on distributional data clustering, which generalizes previous initial works on this task. The empirical results reveal interesting commonalities of different religions. We evaluate the results through measuring the overlap o...
Density Functional Study on Stability and Structural Properties of Cu_n clusters
In this research, the DFT/B3LYP method has been employed to investigate the geometrical structures, relative stabilities, and electronic properties of Cu_n (n=3–10) clusters for clarifying the effect of size on the properties. Through a careful analysis of the successive binding energies, second-order difference of energy and the highest occupied-lowest unoccupied molecular orbital energy gaps as a funct...
Utility of Gambling When Events Are Valued: An Application of Inset Entropy
The present theory leads to a set of subjective weights such that the utility of an uncertain alternative (gamble) is partitioned into three terms involving those weights — a conventional subjectively weighted utility function over pure consequences, a subjectively weighted value function over events, and a subjectively weighted function of the subjective weights. Under several assumptions, thi...
Evaluation of the Effect of Electric Current on the Microstructure and Wear Behavior of a Cast Aluminum-Based Alloy
Employing direct and alternating electric currents at the time of casting and solidification modified the grains of Al and Si. The highest wear resistance was obtained with direct current, and for alternating current the wear resistance corresponded to the electric current. The change of polarity in the pure Al did not influence the wear resistance, but for the Al-Si alloy the highest wear resist...